FlexPS: Flexible Parallelism Control in Parameter Server Architecture

Authors

  • Yuzhen Huang
  • Tatiana Jin
  • Yidi Wu
  • Zhenkun Cai
  • Xiao Yan
  • Fan Yang
  • Jinfeng Li
  • Yuying Guo
  • James Cheng
Abstract

As a general abstraction for coordinating the distributed storage and access of model parameters, the parameter server (PS) architecture enables distributed machine learning to handle large datasets and high-dimensional models. Many systems, such as Parameter Server and Petuum, have been developed based on the PS architecture and are widely used in practice. However, none of these systems supports changing parallelism during runtime, which is crucial for the efficient execution of machine learning tasks with dynamic workloads. We propose a new system, called FlexPS, which introduces a novel multi-stage abstraction to support flexible parallelism control. With the multi-stage abstraction, a machine learning task can be mapped to a series of stages, and the parallelism for a stage can be set according to its workload. Optimizations such as a stage scheduler, stage-aware consistency controller, and direct model transfer are proposed for the efficiency of multi-stage machine learning in FlexPS. As a general and complete PS system, FlexPS also incorporates many optimizations that are not limited to multi-stage machine learning. We conduct extensive experiments using a variety of machine learning workloads, showing that FlexPS achieves significant speedups and resource savings compared with state-of-the-art PS systems such as Petuum and Multiverso.

PVLDB Reference Format: Yuzhen Huang, Tatiana Jin, Yidi Wu, Zhenkun Cai, Xiao Yan, Fan Yang, Jinfeng Li, Yuying Guo, James Cheng. FlexPS: Flexible Parallelism Control in Parameter Server Architecture. PVLDB, 11(5): 566-579, 2018. DOI: https://doi.org/10.1145/3177732.3177734
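
The core idea is easy to picture in code. Below is a minimal sketch of the multi-stage abstraction, assuming a hypothetical Python-style API (FlexPS itself is a C++ system; Stage, run_stages, and num_workers are illustrative names, not its actual interface): a task is an ordered list of stages, and each stage declares the parallelism that fits its workload.

```python
# A sketch of the multi-stage abstraction: an ML task is expressed as a
# sequence of stages, each declaring its own parallelism. All names here
# (Stage, run_stages, num_workers) are hypothetical, not the FlexPS API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    num_workers: int               # parallelism chosen for this stage's workload
    body: Callable[[int], None]    # per-worker function, given the worker id

def run_stages(stages: List[Stage]) -> None:
    # Each stage gets its own worker set, so parallelism can grow or shrink
    # between stages; in a real PS system, model state would be handed off
    # between stages via the parameter servers.
    for stage in stages:
        with ThreadPoolExecutor(max_workers=stage.num_workers) as pool:
            list(pool.map(stage.body, range(stage.num_workers)))

# Example: a communication-heavy pass with few workers, then a
# computation-heavy pass with many.
run_stages([
    Stage("sparse_update", num_workers=4,  body=lambda w: None),
    Stage("dense_compute", num_workers=32, body=lambda w: None),
])
```

The point is that the worker set is not fixed for the lifetime of the job: a communication-heavy stage can run with few workers while a computation-heavy stage runs with many, which is where the speedups and resource savings above come from.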

Similar Articles

Workflow Engine with Multi-Level Parallelism Supports

This paper presents the SWFL workflow engine, a general workflow framework that meets the needs of business processes as well as scientific computing processes, with fine-grained multi-level parallelism support. The workflow description language, SWFL, follows a graph-oriented model to specify workflow processes composed of services. The workflow engine provides an efficient enactment environment for S...
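
To make the graph-oriented model concrete, here is a minimal sketch under generic assumptions (run_workflow and its dependency-dictionary encoding are hypothetical, not SWFL's actual description language): services whose prerequisites have completed are enacted concurrently, which is one simple level of the parallelism described above.

```python
# A generic graph-oriented workflow sketch (hypothetical, not SWFL syntax):
# services are nodes, edges are dependencies, and every node whose
# prerequisites are done is enacted concurrently with its peers.
from concurrent.futures import ThreadPoolExecutor

def run_workflow(deps, actions, max_workers=4):
    # deps: node -> set of prerequisite nodes; actions: node -> callable
    done, pending = set(), dict(deps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [n for n, d in pending.items() if d <= done]
            if not ready:
                raise ValueError("cycle in workflow graph")
            for n in ready:
                del pending[n]
            list(pool.map(lambda n: actions[n](), ready))  # run layer in parallel
            done.update(ready)

# B and C both depend on A and run in parallel; D joins their results.
acts = {n: (lambda n=n: print("service", n)) for n in "ABCD"}
run_workflow({"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}, acts)
```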

Scalable and Flexible heterogeneous multi-core system

Multi-core systems are widely used in today's applications due to their low power consumption and high performance. Many researchers aim to improve the performance of these systems by providing flexible multi-core architectures. Flexibility in multi-core processor systems provides high throughput for uniform parallel applications as well as high performance for more general work. This fl...

Development of a Flexible PERMIS Authorisation Module for Shibboleth and Apache Server

This paper describes the development of a flexible Role Based Access Control (RBAC) authorisation module, the Shibboleth and Apache Authorisation Module (SAAM), which is based on the PERMIS privilege management infrastructure. It explains how the module can work with the Apache web server, with or without Shibboleth. We argue that this can effectively improve the level of trust and flexibility ...

Maestro: A System for Scalable OpenFlow Control

The fundamental feature of an OpenFlow network is that the controller is responsible for the initial establishment of every flow by contacting the related switches. Thus, the performance of the controller could be a bottleneck. This paper shows how this fundamental problem is addressed by parallelism. The state-of-the-art OpenFlow controller, called NOX, achieves a simple programming model for contr...
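
The snippet is cut off before the details, but the general shape of removing a controller bottleneck with parallelism can be sketched as a worker pool that services flow-setup requests concurrently; this is an illustration of the idea only, not Maestro's actual design.

```python
# Illustration only (not Maestro's actual design): flow-setup requests are
# pushed onto a queue and handled by a pool of worker threads, so the
# controller is not serialized on a single event-handling loop.
import queue
import threading

requests = queue.Queue()

def worker():
    while True:
        pkt = requests.get()
        if pkt is None:                  # shutdown sentinel
            break
        # compute a route and install flow entries (stubbed out here)
        print("installed flow for", pkt)
        requests.task_done()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for i in range(100):                     # simulated packet-in events
    requests.put("flow-%d" % i)
requests.join()                          # wait for all requests to finish
for _ in threads:
    requests.put(None)
for t in threads:
    t.join()
```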

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs

We show empirically that in SGD training of deep neural networks, one can, at no or nearly no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the quantization error is carried forward across minibatches (error feedback). This size reduction makes it feasible to parallelize SGD through data-parallelism with fast processors like recent GPUs. We implement data-par...
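
The quantization rule described above is compact enough to sketch directly. The following is a minimal numpy version of 1-bit quantization with error feedback; the shared scale here is an illustrative choice, not necessarily the paper's exact reconstruction.

```python
# A minimal numpy sketch of 1-bit quantization with error feedback. The
# shared scale (mean absolute value) is an illustrative choice, not the
# exact reconstruction used in the paper.
import numpy as np

def one_bit_quantize(grad, residual):
    g = grad + residual                  # fold in the error from the last step
    scale = np.mean(np.abs(g))           # one shared magnitude per tensor
    q = np.where(g >= 0, scale, -scale)  # 1 bit per value: just the sign
    residual = g - q                     # carry the quantization error forward
    return q, residual

# The residual starts at zero and persists across minibatches, so no
# gradient information is lost, only delayed.
rng = np.random.default_rng(0)
residual = np.zeros(4)
for step in range(3):
    grad = rng.standard_normal(4)
    q, residual = one_bit_quantize(grad, residual)
    print(step, q, residual)
```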

Journal:
  • PVLDB

Volume 11, Issue 5

Pages: -

Publication date: 2018